A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification
Abstract
The task of extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important task for enhancing text processing capabilities. This paper proposes a novel approach to building the feature vector used in the web text document classification process by adding semantics to the generated feature vector. The approach exploits the hierarchical structure of the WordNet ontology to eliminate meaningless words from the generated feature vector, that is, words that have no semantic relation to any WordNet lexical category. This reduces the feature vector size without losing information about the text. The feature vector is also enriched by concatenating each word with its corresponding WordNet lexical category. For the mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency-Inverse Document Frequency (TF-IDF) is used as the term weighting technique. The proposed ontology-based approach was evaluated in several experiments with five different classifiers (SVM, JRIP, J48, Naive Bayes, and kNN). It was compared against the Principal Component Analysis (PCA) approach and against an ontology-based reduction technique that does not add semantics to the generated feature vector. The experimental results show that the authors' proposed approach achieves better classification accuracy, F-measure, precision, and recall than these traditional approaches.
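The two ontology-based steps described in the abstract (dropping words with no WordNet lexical category, then tagging each surviving word with its category) can be sketched together with TF-IDF weighting in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `TOY_LEXNAMES` is a hypothetical hand-built dictionary standing in for WordNet's lexicographer files, and its entries are not claimed to match WordNet's actual first senses. The `word|category` separator, whitespace tokenization, the one-category-per-word choice, and the raw-count × log(N/df) TF-IDF variant are all illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps a lowercase word to a single
# lexicographer file name ("lexical category"). The entries are hand-picked
# for illustration only and are not claimed to match WordNet's first senses.
TOY_LEXNAMES = {
    "car": "noun.artifact",
    "engine": "noun.artifact",
    "drive": "verb.motion",
    "dog": "noun.animal",
    "bark": "verb.communication",
    "park": "noun.location",
}


def semantic_terms(text, lexicon=TOY_LEXNAMES):
    """Drop tokens with no lexical category and tag the rest as word|category."""
    terms = []
    for token in text.lower().split():
        word = token.strip(".,;:!?()\"'")  # naive punctuation stripping
        category = lexicon.get(word)
        if category is not None:  # no category -> treated as meaningless, dropped
            terms.append(f"{word}|{category}")
    return terms


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (raw count * log(N / df)) for each document."""
    term_lists = [semantic_terms(d) for d in docs]
    n_docs = len(term_lists)
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))  # document frequency: count each term once per doc
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


if __name__ == "__main__":
    docs = [
        "The car engine failed, so we could not drive to the park.",
        "A dog will bark at every car in the park.",
    ]
    for vec in tfidf_vectors(docs):
        print(sorted(vec.items()))
```

With this IDF variant, terms that occur in every document (here `car` and `park`) receive a weight of zero. A real pipeline would replace the toy dictionary with WordNet lookups, for example NLTK's `wordnet.synsets(word)[0].lexname()`, and would need its own policy for words with several senses.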
Related articles
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification
In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...
A New Text Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking
Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental material increases day by day, so we need useful ways to save it and retrieve information from it. For example, search engines such as Google, Yahoo, Bing, etc. need to read a great many web documents and retrieve the most similar ones to the user ...
Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents
This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...
Journal title:
- IJSI
Volume 6, Issue
Pages -
Publication date: 2018